Structural and relational data mining for systems biology applications

Author

  • Elisabeth Georgii
Abstract

Due to the enormous accumulation of experimental data and the increasing need to combine heterogeneous data sources, the field of systems biology yields novel and challenging problems in data analysis. The development of high-throughput technologies has made it possible to study the behavior of many cellular components simultaneously. As a result, there is growing interest not only in understanding the functions of single isolated components, but also in revealing the interactions and functional relationships between different components. Often, the outcome of large-scale measurements is conveniently represented in a structured form; prominent examples are protein-protein interaction networks, coexpression networks for genes, and bipartite graphs of associations between experimental conditions and regulated genes. This thesis presents different methods that aim at finding interesting patterns in such data. The main contributions are as follows. First, an exact enumerative approach to dense cluster detection is proposed. Given a weighted interaction network and a default weight for missing edges, the density of a node set is defined as the average pairwise interaction weight. The described method finds all patterns that satisfy a user-defined minimum density threshold. Conceptually, this task is a generalization of clique search; however, the standard techniques for solving that problem are not appropriate for the generalized question. Fortunately, an efficient enumeration strategy can be achieved by adopting the reverse search paradigm. Remarkably, the same algorithmic framework can be applied to discover cluster patterns in other types of structured data, such as asymmetric binary relations and multipartite graphs, as well as hypergraphs, n-ary relations, and tensors. Second, our approach integrates additional constraints in order to focus the search on clusters that are relevant for the specific application at hand. For example, if each node in a network has an annotation profile attached to it, we can identify dense clusters where the nodes share a common subprofile. The principal idea is that the user provides the datasets of interest and defines desired properties of
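
The density definition in the abstract (average pairwise interaction weight, with a default weight for missing edges) can be illustrated with a minimal Python sketch. The function names `density` and `dense_sets_bruteforce`, the toy network, and the threshold value are hypothetical and chosen only for illustration. The enumeration here is a plain brute-force scan over all node subsets, not the reverse search strategy proposed in the thesis, and it is only feasible for very small networks.

```python
from itertools import combinations


def density(nodes, weights, default=0.0):
    """Average pairwise interaction weight of a node set.

    `weights` maps frozenset({u, v}) to an edge weight; pairs that are
    missing from the mapping contribute `default`.
    """
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0  # assumption: sets with fewer than two nodes get density 0
    total = sum(weights.get(frozenset(pair), default) for pair in pairs)
    return total / len(pairs)


def dense_sets_bruteforce(all_nodes, weights, min_density, default=0.0):
    """Return every node set (size >= 2) whose density reaches the threshold.

    Brute force over all subsets: exponential in the number of nodes,
    so this is an illustration of the task, not an efficient method.
    """
    nodes = sorted(all_nodes)
    result = []
    for size in range(2, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if density(subset, weights, default) >= min_density:
                result.append(subset)
    return result


# Hypothetical toy network with four nodes; the pairs A-D and B-D are missing.
toy_weights = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.9,
    frozenset(("C", "D")): 0.4,
}

clusters = dense_sets_bruteforce("ABCD", toy_weights, min_density=0.85)
print(clusters)
```

In this toy network the set {A, B, C} reaches the threshold of 0.85 while its subset {A, C} does not. Density is therefore not preserved when nodes are removed from a dense set, which is one way to see that pruning rules relying on that property, as in clique search, do not carry over directly.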

Similar resources

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...

Developing Tightly-Coupled Data Mining Applications on a Relational Database System

We present a methodology for tightly coupling data mining applications to database systems to build high-performance applications, without requiring any change to the database software.

Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are among the most important portions of biological sequences and play crucial roles in the corresponding sequence’s fold and functionality. The biggest class of repetitive subsequences is “Transposable Elements”, which has its own sub-classes depending on the contexts’ structures. Much research has been performed to critically determine the structure and function of repetitiv...

C 5.8 Spatial Analysis

Martin Ester. Preliminary Draft of August 1, 1998. Both the number and the size of spatial databases are growing rapidly in applications such as geomarketing, astrophysics and molecular biology. This is mainly due to the amazing progress in scientific instruments such as satellites with remote sensors, telescopes or X-ray crystallography. While a lot of algorithms have bee...

Practical Applications of Data Mining

Technologies have undoubtedly brought tremendous change to the field of bioinformatics and other related areas. Extensive research is still being carried out on the fundamentals of data mining in genomics and proteomics, addressing recent research developments that depend on the analysis and interpretation of large amounts of data generated by high-throughput te...

Relational XES: Data Management for Process Mining

Information systems log data during the execution of business processes in so-called “event logs”. Process mining aims to improve business processes by extracting knowledge from event logs. Currently, the de facto standard for storing and managing event data, XES, is tailored towards sequential access to this data. Handling more and more data in process mining applications is an important chall...

Publication date: 2010